[5.7] [DNM] Cherry-pick batch test PR #443

hamishknight · 2022-05-27T10:36:37Z

Batch test PR against the Swift repo for:

Rename various APIs

Move options from RegexComponent to Regex

Remove the DSL -> _CharacterClassModel conversion, and _CharacterClassModel's custom character class matching logic, none of which is being used.

`makeDSLTreeCharacterClass` was the last API that required it to be public. Remove it, and replace it with some static members on `_AST.Atom`.

Fix `CharacterClass.newlineSequence`: map to `.newlineSequence` instead of `.newline`, which allows it to create the correct consumer. rdar://96330096

Rename `any` to `dot` to make explicit that we're talking about `.`, which does not match newlines unless in single line mode.

Re-introduce `DSLTree.Atom.any`, this time as a "true any" that matches any character, including newlines.

Fix `CharacterClass.any`: it should map to `.any`, not `.dot`. rdar://96509234
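As a rough illustration of the `.dot` / `.any` distinction described above, here is a toy model. `ToyAtom` and its `matches` method are hypothetical names for this sketch and are not the library's types:

```swift
// Toy model only; not the library's implementation.
enum ToyAtom {
    case dot   // like `.`: excludes newlines unless single line mode is on
    case any   // "true any": matches every character, newlines included

    func matches(_ c: Character, singleLine: Bool = false) -> Bool {
        switch self {
        case .any:
            return true
        case .dot:
            return singleLine || !c.isNewline
        }
    }
}
```

The point of keeping the two cases separate is that only `dot` consults the matching options.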

Move `AssertionKind` onto the DSL: this enum will start including cases that only the DSL can use, so move it off the AST.

Introduce `startOfInput` and `endOfInput` assertion kinds, and map the DSL to them such that they do not depend on matching options. rdar://97029630
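To make the difference concrete, here is a minimal sketch of an option-dependent anchor next to an option-independent one. `ToyAssertion` and `holds(at:in:multiline:)` are hypothetical; only the case names are borrowed from this PR:

```swift
// Toy model only; the real assertion kinds live in the regex engine.
enum ToyAssertion {
    case caretAnchor    // `^`: start of input, or start of a line in multiline mode
    case startOfInput   // always the very start of the input, regardless of options

    func holds(at index: Int, in input: [Character], multiline: Bool) -> Bool {
        switch self {
        case .startOfInput:
            return index == 0
        case .caretAnchor:
            if index == 0 { return true }
            // Only the caret anchor consults the multiline option.
            return multiline && input[index - 1].isNewline
        }
    }
}
```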

Implement `CharacterClass.anyNonNewline`. rdar://97029702

…ftlang#560) This fixes infinite loops when we loop over an internal node that makes no forward progress. Also included is an optimization to only emit the check/break instructions if we have a case that might result in an infinite loop (a possibly non-progressing inner node combined with unlimited quantification).
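The forward-progress check can be sketched with a toy closure-based matcher. This is not the engine's instruction set; `ToyMatcher`, `zeroOrMore` and `optionalA` are hypothetical names used only for illustration:

```swift
// A matcher takes a position and returns the position after a match, or nil on failure.
typealias ToyMatcher = (_ input: [Character], _ pos: Int) -> Int?

/// Greedy unbounded repetition of `inner`, with a forward-progress check.
func zeroOrMore(_ inner: @escaping ToyMatcher) -> ToyMatcher {
    return { input, start in
        var pos = start
        while let next = inner(input, pos) {
            // Without this check, an inner matcher that succeeds without
            // consuming input would keep the loop spinning forever.
            if next == pos { break }
            pos = next
        }
        return pos
    }
}

// An inner node that can match the empty string: an optional "a".
let optionalA: ToyMatcher = { input, pos in
    if pos < input.count && input[pos] == "a" { return pos + 1 }
    return pos  // matched the empty string: no progress
}
```

In this sketch the check runs on every iteration; the optimization described above would skip it when the inner node is known to always consume input.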

)

- Adds new instructions for matching characters and scalars case insensitively
- Compiles ASCII character matches into the faster scalar match instructions even in grapheme semantic mode
- Optimizes out unnecessary runtime grapheme boundary checks for all-ASCII strings
- Also includes fixes to scalar matching in grapheme semantic mode (swiftlang#565)

Validate optimizations when a match fails. This allows us to catch the case where a match occurs without optimizations, but doesn't occur with optimizations. Additionally fix the `xfail` param such that it can't be used on tests that actually match expectations.
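The idea of the validation is roughly the following. `resultsAgree` and its `match` parameter are hypothetical stand-ins for the test suite's matching entry point, not the actual test helpers:

```swift
/// Runs `match` with optimizations on and off and reports whether the two agree.
/// Checking this on failed matches too catches the case where only the
/// optimized path fails to match.
func resultsAgree(
    input: String,
    match: (_ input: String, _ optimized: Bool) -> Bool
) -> Bool {
    let optimized = match(input, true)
    let unoptimized = match(input, false)
    return optimized == unoptimized
}
```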

Replace a couple of `#if os(Linux)` checks with a check to see if we have a newer stdlib available. This lets us emit an expected failure in the case where we're testing on an older stdlib.

Previously we performed a lexicographic comparison with the bounds of a character class range. However this produced surprising results, and our implementation didn't properly handle case sensitivity. Update the logic to instead only allow single scalar NFC bounds. The input is then converted to NFC in grapheme semantic mode, and checked against the range. In scalar semantic mode, the input scalar is checked on its own. Additionally, fix the case sensitivity handling such that we check both the lowercase and uppercase version of the input against the range.
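A minimal sketch of the scalar-mode check with the case sensitivity fix, using only standard library APIs. `rangeContains` is a hypothetical helper, and the NFC conversion used in grapheme semantic mode is deliberately left out of this sketch:

```swift
/// Checks a single scalar against a range with single-scalar bounds.
/// When `caseInsensitive` is set, both the lowercase and uppercase
/// mappings of the input are also checked against the range.
func rangeContains(
    _ range: ClosedRange<Unicode.Scalar>,
    _ scalar: Unicode.Scalar,
    caseInsensitive: Bool
) -> Bool {
    if range.contains(scalar) { return true }
    guard caseInsensitive else { return false }
    let candidates = [
        scalar.properties.lowercaseMapping,
        scalar.properties.uppercaseMapping,
    ]
    return candidates.contains { mapped in
        // Assumption for this sketch: ignore mappings that expand to several scalars.
        let scalars = mapped.unicodeScalars
        guard scalars.count == 1, let only = scalars.first else { return false }
        return range.contains(only)
    }
}
```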

Previously we would emit a series of scalars written in the DSL as a series of individual characters in grapheme semantic mode. Change the behavior such that we coalesce any adjacent scalars and characters, including those in regex literals and nested concatenations. We then perform grapheme breaking over the result, and can emit character matches for scalars that coalesced into a grapheme. This transform subsumes a similar transform we performed for regex literals when converting them to a DSLTree. This has the nice side effect of allowing us to better preserve scalar syntax in the DSL transform. rdar://96942688
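For intuition, coalescing followed by grapheme breaking behaves roughly like this standard-library-only sketch. `coalesce` is a hypothetical helper, not the DSLTree transform itself:

```swift
/// Concatenates adjacent scalars, then splits the result on grapheme boundaries.
func coalesce(_ scalars: [Unicode.Scalar]) -> [Character] {
    var view = String.UnicodeScalarView()
    view.append(contentsOf: scalars)
    // Iterating a String yields Characters, i.e. extended grapheme clusters.
    return Array(String(view))
}
```

For example, `e` followed by U+0301 (combining acute accent) and `x` comes back as two characters rather than three, so a single character match can be emitted for the first two scalars.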

Fix scalar mode for quoted sequences in character classes. Previously we would only match entire characters. Update to use the generic Character consumer logic that can handle scalar semantic mode. rdar://97209131

In grapheme semantic mode, coalesce adjacent character and scalar members of a custom character class, over which we can perform grapheme breaking. This involves potentially re-writing ranges such that they contain a complete grapheme of adjacent scalars.

Make sure we throw the right error for ranges that are invalid in grapheme mode, but are valid in scalar mode.

I also noticed that `lexQuantifier` could silently eat trivia if it failed to lex a quantification, so also fix that.
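The trivia fix amounts to restoring the lexer position when no quantifier is found. A toy version, where `ToyLexer` is hypothetical, trivia is modeled as plain spaces, and only the method name is borrowed from the parser:

```swift
// Toy lexer only; the real lexer handles far more syntax.
struct ToyLexer {
    var input: [Character]
    var pos = 0

    mutating func skipTrivia() {
        while pos < input.count, input[pos] == " " { pos += 1 }
    }

    /// Tries to lex `*`, `+` or `?` after optional trivia.
    /// On failure the position is restored, so no trivia is consumed.
    mutating func lexQuantifier() -> Character? {
        let start = pos
        skipTrivia()
        if pos < input.count, "*+?".contains(input[pos]) {
            defer { pos += 1 }  // consume the quantifier after reading it
            return input[pos]
        }
        pos = start  // roll back the trivia we skipped
        return nil
    }
}
```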

hamishknight · 2022-07-21T19:21:27Z

@swift-ci please test

stephentyrone · 2022-07-22T17:09:35Z

Hamish, we can close this one out now, right?

hamishknight · 2022-07-22T17:10:44Z

Yeah

hamishknight mentioned this pull request May 27, 2022

[5.7] [DNM] Null PR swiftlang/swift#42532

Closed

hamishknight force-pushed the 5.7-test-queue branch from 37fa869 to 3e6562b Compare May 30, 2022 10:30

hamishknight force-pushed the 5.7-test-queue branch from 3e6562b to 6765950 Compare July 6, 2022 10:10

hamishknight mentioned this pull request Jul 6, 2022

[5.7] [test] Update Regex type for flattened captures swiftlang/swift#59912

Merged

hamishknight force-pushed the 5.7-test-queue branch from 6765950 to 1dfb1a6 Compare July 6, 2022 13:06

Azoy and others added 25 commits July 21, 2022 20:13

Merge pull request swiftlang#575 from Azoy/various-tidbits

863aebe

Rename various APIs

Merge pull request swiftlang#576 from Azoy/options-regex

1079533

Move options from RegexComponent to Regex

Allow matching tests to specify semantic level

5966a5c

Rip out unused _CharacterClassModel API

fe63fb4

Remove the DSL -> _CharacterClassModel conversion, and _CharacterClassModel's custom character class matching logic, none of which is being used.

Remove _CharacterClassModel conformance to RegexComponent

b309fa5

Internalize _CharacterClassModel

0ab3079

`makeDSLTreeCharacterClass` was the last API that required it to be public. Remove it, and replace it with some static members on `_AST.Atom`.

Fix CharacterClass.newlineSequence

b454390

Map to `.newlineSequence` instead of `.newline`, which allows it to create the correct consumer. rdar://96330096

Rename any -> dot

8e920c9

Explicitly disambiguate the fact we're talking about `.`, which does not match newlines unless in single line mode.

Re-introduce DSLTree.Atom.any

d6a03a0

This time as a "true any" that matches any character, including newlines.

Fix CharacterClass.any

da59c30

This should map to `.any`, not `.dot`. rdar://96509234

Rename startOfLine/endOfLine -> caretAnchor/dollarAnchor

217aef4

Move AssertionKind onto the DSL

dff47ff

This enum will start including cases that only the DSL can use, so move it off the AST.

Fix Anchor.startOfLine and Anchor.endOfLine

1b3ba2c

Introduce `startOfInput` and `endOfInput` assertion kinds, and map the DSL to them such that they do not depend on matching options. rdar://97029630

Add some tests for CharacterClass.anyGraphemeCluster

0570133

Add some tests for CharacterClass.horizontalWhitespace

c7b42f8

Implement CharacterClass.anyNonNewline

47888e6

rdar://97029702

Validate optimizations when a match fails

33a937c

This allows us to catch the case where a match occurs without optimizations, but doesn't occur with optimizations. Additionally fix the `xfail` param such that it can't be used on tests that actually match expectations.

Guard against testing with older stdlibs

e343554

Replace a couple of `#if os(Linux)` checks with a check to see if we have a newer stdlib available. This lets us emit an expected failure in the case where we're testing on an older stdlib.

Add some extra character class newline matching tests

1acb82a

Fix scalar mode for quoted sequences in character class

b61c770

Previously we would only match entire characters. Update to use the generic Character consumer logic that can handle scalar semantic mode. rdar://97209131

Form ASCII bitsets for quoted sequences in character classes

bda6fbc

hamishknight added 3 commits July 21, 2022 20:19

Throw RegexCompilationError for invalid character class bounds

d5cad1c

Make sure we throw the right error for ranges that are invalid in grapheme mode, but are valid in scalar mode.

Allow coalescing through trivia

f2d44ff

I also noticed that `lexQuantifier` could silently eat trivia if it failed to lex a quantification, so also fix that.

hamishknight force-pushed the 5.7-test-queue branch from 1dfb1a6 to f2d44ff Compare July 21, 2022 19:21

hamishknight mentioned this pull request Jul 21, 2022

[5.7] [DNM] Null PR (2) swiftlang/swift#60183

Closed

hamishknight closed this Jul 22, 2022

hamishknight deleted the 5.7-test-queue branch July 22, 2022 17:10
